Non-distributive Aggregate Functions (CS 764: Advanced Database Project Report)
Abstract
The ability to efficiently compute multiple related group-bys is critical to On-Line Analytical Processing (OLAP) and multidimensional data analysis. The computation of the CUBE, a special case of this aggregation problem, has been well studied [1]. However, to the best of our knowledge, previous work has focused primarily on computing aggregates over distributive and algebraic functions [3]. In this project, we investigated whether "holistic" aggregate functions can be computed efficiently. We started with a straightforward, non-optimized approach and experimented with alternatives in various aspects, ranging from retaining more information to deliberately accepting less accurate results. We conclude with evidence that, while hardly any optimization is possible when generating the exact result of holistic functions over the CUBE, an approximation method we call bucket cube achieves an accuracy of 85% to 99% while incurring only moderately more overhead than computing the CUBE over common distributive functions.
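The distinction between distributive and holistic functions explains why CUBE optimizations do not carry over directly. The sketch below is a generic illustration, not the authors' implementation, and it does not implement bucket cube. It uses a small made-up fact table and hypothetical helpers `group_by` and `roll_up` to show that a coarser group-by can be derived exactly from a finer one for SUM, but not for MEDIAN, whose exact value needs the underlying rows.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy fact table: (region, product, amount). Values are made up.
rows = [
    ("east", "a", 1), ("east", "a", 2), ("east", "b", 10),
    ("west", "a", 3), ("west", "b", 4), ("west", "b", 5), ("west", "b", 6),
]

def group_by(rows, key, agg):
    """Apply agg to the amounts of each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: agg(v) for k, v in groups.items()}

def roll_up(fine, agg):
    """Derive the (region) group-by using only the (region, product) results."""
    coarse = defaultdict(list)
    for (region, _product), value in fine.items():
        coarse[region].append(value)
    return {k: agg(v) for k, v in coarse.items()}

# Finer group-bys on (region, product).
fine_sum = group_by(rows, lambda r: (r[0], r[1]), sum)
fine_median = group_by(rows, lambda r: (r[0], r[1]), median)

# SUM is distributive: combining partial sums gives the exact coarser result.
exact_sum = group_by(rows, lambda r: r[0], sum)
rolled_sum = roll_up(fine_sum, sum)

# MEDIAN is holistic: the median of sub-group medians is generally not
# the median of the group, so the raw values must be revisited.
exact_median = group_by(rows, lambda r: r[0], median)
rolled_median = roll_up(fine_median, median)

print(rolled_sum == exact_sum)  # True
print(exact_median)             # {'east': 2, 'west': 4.5}
print(rolled_median)            # {'east': 5.75, 'west': 4.0}
```

This gap is what makes holistic functions expensive on the CUBE: each group-by cannot simply be computed from a smaller parent result, so exact evaluation must retain or rescan the detailed data.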
Publication date: 2007